Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data

Authors

  • Jiří Macas
  • Pavel Neumann
  • Petr Novák
  • Jiming Jiang
Abstract

MOTIVATION: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is due in part to the limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. We therefore employed an alternative, alignment-free approach based on k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next-generation sequencing technologies.

RESULTS: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences: 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and whole-genome Sanger sequencing reads (5.8 Mb). The k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. Both the reconstructed consensus sequences and the overall divergence of the k-mer spectra revealed high similarity between the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not differ significantly from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. In contrast, considerable differences were found when the same methods were used to compare CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.
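As a rough illustration of the alignment-free idea described in the abstract, the sketch below counts k-mers across a set of reads and compares two resulting frequency spectra. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `kmer_frequencies` and `spectrum_distance`, the choice of k = 4, the half-L1 distance used as a divergence measure, and the toy reads are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Return the relative frequency of every k-mer found in the reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing N or other ambiguity codes.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def spectrum_distance(freq_a, freq_b):
    """Half the L1 distance between two spectra (assumed measure).

    0.0 means identical spectra; 1.0 means no k-mer is shared.
    """
    kmers = set(freq_a) | set(freq_b)
    return 0.5 * sum(
        abs(freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) for m in kmers
    )


# Toy reads invented for illustration; these are not CentO sequences.
reads_a = ["ACGTACGTAC", "CGTACGTACG"]
reads_b = ["ACGTACGTAC", "ACGTTCGTAC"]

freq_a = kmer_frequencies(reads_a, 4)
freq_b = kmer_frequencies(reads_b, 4)
distance = spectrum_distance(freq_a, freq_b)
```

With real data, the two read sets would stand in for the ChIP-seq and whole-genome datasets, and the most frequent k-mers in each spectrum would point to the most conserved regions of the monomer.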

Similar resources

Identification and mapping of expressed genes, simple sequence repeats and transposable elements in centromeric regions of rice chromosomes.

The genomic sequences derived from rice centromeric regions were analyzed to facilitate the comprehensive understanding of the rice genome. A rice centromere-specific satellite sequence, RCS2/TrsD/CentO, was used to screen P1-derived artificial chromosome (PAC) and bacterial artificial chromosome (BAC) genomic libraries derived from Oryza sativa L. ssp. japonica cultivar Nipponbare. Physical ma...

Genomic and genetic characterization of rice Cen3 reveals extensive transcription and evolutionary implications of a complex centromere.

The centromere is the chromosomal site for assembly of the kinetochore where spindle fibers attach during cell division. In most multicellular eukaryotes, centromeres are composed of long tracts of satellite repeats that are recalcitrant to sequencing and fine-scale genetic mapping. Here, we report the genomic and genetic characterization of the complete centromere of rice (Oryza sativa) chromo...

Composition and Structure of the Centromeric Region

Understanding the organization of eukaryotic centromeres has both fundamental and applied importance because of their roles in chromosome segregation, karyotypic stability, and artificial chromosome-based cloning and expression vectors. Using clone-by-clone sequencing methodology, we obtained the complete genomic sequence of the centromeric region of rice (Oryza sativa) chromosome 8. Analysis o...

Genome-wide characterization of centromeric satellites from multiple mammalian genomes.

Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced...

Isolation and identification of Eurotium species from contaminated rice by morphology and DNA sequencing

Thirty milled rice samples were collected from retailers in four states of Malaysia. These samples were evaluated for Eurotium spp. contamination by direct plating on malt extract salt agar (MESA). All Eurotium isolates were identified based on morphology and nucleotide sequences of internal transcribed spacer 1 (ITS1) and ITS2 of the rDNA. Four Eurotium species (E. rubrum, E. amstelodami, E....


Journal:
  • Bioinformatics

Volume 26, Issue 17

Pages -

Publication date: 2010